Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Support using pandas nullable types #77

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

AndyBryson
Copy link
Contributor

@AndyBryson AndyBryson commented Mar 5, 2024

Closes #76

Proposed Changes

Use pandas to decide what is NaN to allow us to have nullable bools, ints and strings.

Profiling

With a real world dataset, 10_000 lines, 156 columns

exiting method, no pandas null

1.86 s ± 26.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

new method, no pandas null

1.7 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

new method, with pandas null

1.51 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Checklist

  • CHANGELOG.md updated
  • Rebased/mergeable
  • A test has been added if appropriate
  • Tests pass
  • Commit messages are conventional
  • Sign CLA (if not already signed)

@bednar
Copy link
Member

bednar commented Mar 14, 2024

@AndyBryson thanks for your PR, I will review your changes ASAP

@bednar bednar self-requested a review March 14, 2024 07:23
Copy link
Member

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AndyBryson,

thanks for your PR 👍. Can you please rebase your sources with main branch and satisfy our checklist?

image

Best

@bednar bednar changed the title Support using pandas nullable types fix: Support using pandas nullable types Mar 18, 2024
@AndyBryson
Copy link
Contributor Author

Hi @bednar. That's mostly done. There is no CHANGELOG.md, so I've not updated it.

Copy link
Member

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyBryson I've add the CHANGELOG.md to main branch.

You can add something like:

### Bugfix

1. [#77](https://github.com/InfluxCommunity/influxdb3-python/pull/77): Support using pandas nullable types

Copy link
Member

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rebase your sources instead of merging? This approach would make it easier for us to review your changes.

@AndyBryson
Copy link
Contributor Author

Could you please rebase your sources instead of merging? This approach would make it easier for us to review your changes.

Hopefully that's all done now.

Copy link
Member

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your PR, there are a few changes that sound reasonable from our POV:

Comment on lines 16 to 20
try:
from ...extras import pd

def _not_nan(x):
return not pd.isna(x)

except ImportError:
pd = None

def _not_nan(x):
return x == x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplify to because the pandas should be present:

Suggested change
try:
from ...extras import pd
def _not_nan(x):
return not pd.isna(x)
except ImportError:
pd = None
def _not_nan(x):
return x == x
def _not_nan(x):
from ...extras import pd
return not pd.isna(x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 204 to 208
if issubclass(value.type, np.integer):
field_value = f"{sep}{key_format}={{{val_format}}}i"
elif issubclass(value.type, np.bool_):
field_value = f'{sep}{key_format}={{{val_format}}}'
elif issubclass(value.type, np.floating):
if null_columns.iloc[index]:
field_value = f"""{{"" if math.isnan({val_format}) else f"{sep}{key_format}={{{val_format}}}"}}"""
field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}i"}}"""
else:
field_value = f"{sep}{key_format}={{{val_format}}}i"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplify to:

 if issubclass(value.type, np.integer) or issubclass(value.type, np.floating) or issubclass(value.type, np.bool_):  # noqa: E501
                suffix = 'i' if issubclass(value.type, np.integer) else ''
                if null_columns.iloc[index]:
                    field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}{suffix}"}}"""  # noqa: E501
                else:
                    field_value = f"{sep}{key_format}={{{val_format}}}{suffix}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Allow nullable types that aren't floats. This means that we can upload
data frames that have int/bool columns that have empty cells and still
maintain the correct type.
Copy link
Member

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this PR 👍

LGTM 🚀

@bednar bednar merged commit 91e8b47 into InfluxCommunity:main Apr 3, 2024
3 checks passed
@bednar bednar added this to the 0.4.0 milestone Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrameSerializer doesn't support pandas NullableDataTypes
2 participants